November 23, 2022

Data

Data Introduction

  • Gathered from the National Bureau of Economic Research, which creates and distributes a US mortality dataset for every year since 1959
  • Each row represents a single death, while each column represents a different demographic characteristic of the deceased.
  • Used the 2019 edition of the dataset since we did not want to focus on COVID-19
  • Important information includes education, sex, age classification, day of month, place of death, weekday, manner of death, cause of death, and different ailments for each deceased individual

Secondary Dataset

  • Used the Behavioral Risk Factor Surveillance System (BRFSS) survey
  • Includes different free text survey questions from across the United States and territories with responses broken out by subgroup
    • Includes questions on demographic characteristics, plus queries on current health behaviors, such as tobacco use and seatbelt use
  • Combined the secondary dataset with the death dataset by matching subgroups between the two
    • Tried to use aggregate statistics to analyze how risk factors can be matched with causes of death

Given someone is dead, how did they die?

  • Since everyone accounted for in this data is deceased, we want to examine how cause of death relates to factors such as an individual’s:
    • Age
    • Gender
    • Place of death
    • Educational Level
    • Health Conditions
    • Race
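
Both datasets break their records out by factors like these, so matching them comes down to joining two sets of aggregates on a shared subgroup key. A minimal sketch of that idea follows, assuming each dataset has already been reduced to one number per subgroup. The key layout `(sex, age group)`, the names `death_share`, `risk_rate`, and `match_subgroups`, and all of the values are made up for illustration and do not come from either dataset.

```python
# Hypothetical aggregates keyed by (sex, age group). Values are toy numbers.
death_share = {  # share of a subgroup's deaths attributed to one cause
    ("F", "65+"): 0.04,
    ("M", "65+"): 0.05,
    ("M", "18-24"): 0.01,
}
risk_rate = {  # share of survey respondents in a subgroup reporting a behavior
    ("F", "65+"): 0.09,
    ("M", "65+"): 0.11,
    ("F", "18-24"): 0.15,
}


def match_subgroups(left, right):
    """Inner join two {subgroup: value} mappings on the keys they share."""
    shared = sorted(left.keys() & right.keys())
    return {key: (left[key], right[key]) for key in shared}


matched = match_subgroups(death_share, risk_rate)
for subgroup, (share, rate) in matched.items():
    print(subgroup, share, rate)
```

Subgroups that appear in only one dataset are dropped by the inner join, so it may be worth checking how many subgroups survive before comparing the two columns.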

What explains trends or irregularities in mortality when looking at different data factors?

Exploration

Table

| cause | 8th grade or less | 9 - 12th grade, no diploma | Associate degree | Bachelor’s degree | Doctorate or professional degree | high school graduate or GED completed | Master’s degree | some college credit, but no degree | Unknown |
|---|---|---|---|---|---|---|---|---|---|
| Alzheimer’s disease | 0.0501966 | 0.0358828 | 0.0382503 | 0.0475508 | 0.0480320 | 0.0423834 | 0.0510434 | 0.0381342 | 0.0286979 |
| Assault (homicide) | 0.0059129 | 0.0161088 | 0.0038841 | 0.0022317 | 0.0014330 | 0.0072094 | 0.0018295 | 0.0059730 | 0.0076438 |
| Atherosclerosis | 0.0017687 | 0.0014094 | 0.0012893 | 0.0015848 | 0.0017679 | 0.0015421 | 0.0015769 | 0.0015382 | 0.0016092 |
| Influenza and pneumonia | 0.0219389 | 0.0179379 | 0.0160672 | 0.0162589 | 0.0178096 | 0.0173171 | 0.0163816 | 0.0158468 | 0.0197130 |
| Leukemia | 0.0064566 | 0.0058755 | 0.0097400 | 0.0110452 | 0.0129525 | 0.0073617 | 0.0117504 | 0.0087811 | 0.0047510 |
| Sudden infant death syndrome | 0.0041554 | NA | NA | NA | NA | NA | NA | NA | 0.0025863 |
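
The values in the table look like the share of deaths within each education level attributed to each cause, although the slides do not state this, so treat it as an assumption. Under that reading, a table of this shape could be computed roughly as below. The record layout and the function name `cause_share_by_group` are hypothetical, and the five sample records are toy data, not rows from the mortality file.

```python
from collections import Counter, defaultdict

# Hypothetical records: one (education level, cause of death) pair per death.
deaths = [
    ("Bachelor's degree", "Leukemia"),
    ("Bachelor's degree", "Alzheimer's disease"),
    ("Bachelor's degree", "Alzheimer's disease"),
    ("Unknown", "Assault (homicide)"),
    ("Unknown", "Leukemia"),
]


def cause_share_by_group(records):
    """Return {group: {cause: share of that group's deaths}}."""
    totals = Counter(group for group, _ in records)
    counts = defaultdict(Counter)
    for group, cause in records:
        counts[group][cause] += 1
    return {
        group: {cause: n / totals[group] for cause, n in by_cause.items()}
        for group, by_cause in counts.items()
    }


shares = cause_share_by_group(deaths)
for group, by_cause in shares.items():
    print(group, by_cause)
```

A cause with no recorded deaths in a group is simply absent from that group's dictionary, which is one plausible explanation for the NA cells in the sudden infant death syndrome row.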

Killer Plot

Analysis

Free Text

TODO: fix random forest.